Rifamycin biosynthesis gene cluster

Rifamycins form an important group of macrocyclic antibiotics (Wehrli, Topics in Current
Chemistry (1971), 72, 21-49). They consist of a naphthoquinone chromophore spanned by a
long aliphatic bridge. Rifamycins belong to the class of ansamycin antibiotics, which are
produced by several Gram-positive soil bacteria of the actinomycete group and by a few
plants.

Ansamycins are characterized by a flat aromatic nucleus spanned by a long aliphatic bridge
joining opposite positions of the nucleus. Two groups of ansamycins can be distinguished
by the structure of the aromatic nucleus. One group has a naphthoquinoid chromophore,
typical representatives being rifamycin, streptovaricin, tolypomycin and naphthomycin.
The second group, which has a benzoquinoid chromophore, is typified by geldanamycin, the
maytansines and the ansamitocins (Ghisalba et al., Biotechnology of Industrial Antibiotics,
Vandamme E. J., Ed., Dekker Inc., New York (1984), 281-327). In contrast to antibiotics of
the macrolide type, the ansamycins contain in their aliphatic ring system an amide linkage
rather than a lactone linkage, and this amide linkage forms the connection to the
chromophore.

The discovery of the rifamycins produced by the microorganism Streptomyces mediterranei
(as the organism was called at that time, see below) was first described in 1959
(Sensi et al., Farmaco Ed. Sci. (1959), 14, 146-147). Extraction of the acidified cultures of
Streptomyces mediterranei with ethyl acetate yielded a mixture of antibiotically active
components, the rifamycins A, B, C, D and E. Rifamycin B, the most stable component,
was separated from the other components and isolated on the basis of its strongly acidic
properties and ease of salt formation.



Rifamycin B has the structure of formula (1).

Rifamycin B is the main component of the fermentation when barbiturate is added to the
fermentation medium and/or improved producer mutants of Streptomyces mediterranei are
used.

The rifamycin producer strain was originally classified as Streptomyces mediterranei (Sensi
et al., Farmaco Ed. Sci. (1959), 14, 146-147). Analysis of the cell wall of Streptomyces
mediterranei by Thiemann et al. later revealed that this strain has a cell wall typical of
Nocardia, and the strain was reclassified as Nocardia mediterranei (Thiemann et al., Arch.
Microbiol. (1969), 67, 147-151). Nocardia mediterranei has since been reclassified again on
the basis of more recent and more precise morphological and biochemical criteria. Based on
the exact composition of the cell wall, the absence of mycolic acids and the insensitivity to
Nocardia and Rhodococcus phages, the strain has been assigned to the new genus
Amycolatopsis as Amycolatopsis mediterranei (Lechevalier et al., Int. J. Syst. Bacteriol.
(1986), 36, 29).

Rifamycins have strong antibiotic activity, mainly against Gram-positive bacteria such as
mycobacteria, neisseriae and staphylococci. The bactericidal effect of rifamycins derives
from specific inhibition of the bacterial DNA-dependent RNA polymerase, which interrupts
RNA biosynthesis (Wehrli and Staehelin, Bacteriol. Rev. (1971), 35, 290-309). The
semisynthetic rifamycin B derivative rifampin (rifampicin) is widely used clinically as an
antibiotic against Mycobacterium tuberculosis, the causative agent of tuberculosis.

The naphthoquinoid ansamycins of the streptovaricin and tolypomycin groups, like
rifamycin, have an antibacterial effect through inhibition of bacterial RNA polymerase. By
contrast, naphthomycin has an antibacterial effect without inhibiting bacterial RNA
polymerase. The benzoquinoid ansamycins do not inhibit bacterial RNA polymerase and
therefore have relatively weak antibacterial activity, if any. On the other hand, some
representatives of this class of substances act on eukaryotic cells. Thus, antifungal,
antiprotozoal and antitumour properties have been described for geldanamycin, and
antimitotic (antitubulin), antileukaemic and antitumour properties are ascribed to the
maytansines. Some rifamycins also show antitumour and antiviral activity, but only at high
concentrations, so this biological effect appears to be nonspecific.

Despite the great structural variety of the ansamycins, their biosynthesis appears to follow
a metabolic pathway with many common elements (Ghisalba et al., Biotechnology of
Industrial Antibiotics, Vandamme E. J., Ed., Dekker Inc., New York (1984), 281-327). The
aromatic nucleus of all ansamycins is probably built up starting from
3-amino-5-hydroxybenzoic acid. Starting from this molecule, which is presumably activated
as a coenzyme A derivative, the entire aliphatic bridge is synthesized by a multifunctional
polyketide synthase. The length of the bridge and the processing of the keto groups initially
formed by the condensation steps are controlled by the polyketide synthase. Building up
the complete aliphatic bridge of the rifamycins requires 10 condensation steps, 2 with
acetate and 8 with propionate as building blocks. The order of these individual
condensation steps is likewise determined by the polyketide synthase. Structural
comparisons and incorporation studies with radioactive acetate and propionate have
shown that, for the various ansamycins, acetate and propionate are incorporated in an
order which appears to be identical or very similar in the first condensation steps. Thus,
starting from a common synthesis scheme of the ansamycin polyketide synthases (the
rifamycin synthesis scheme), the syntheses of the various ansamycins branch off sooner
or later into side branches, in accordance with their structural differences from
rifamycin (Ghisalba et al., Biotechnology of Industrial Antibiotics, Vandamme E. J., Ed.,
Dekker Inc., New York (1984), 281-327).

Because of the great structural variety of the rifamycins and their specific and interesting
biological effects, there is great interest in understanding the genetic basis of their
synthesis in order to be able to influence it specifically. This is particularly desirable
because, as explained above, the synthesis of rifamycins has much in common with that of
other ansamycins. This similarity in the biosynthesis, which probably derives from a
common evolutionary origin of this metabolic pathway, naturally has a genetic basis.



The genetic basis of secondary metabolite biosynthesis consists essentially of the genes
which code for the individual biosynthetic enzymes and of the regulatory elements which
control the expression of the biosynthesis genes. In all actinomycete systems investigated
to date, the secondary metabolite synthesis genes have been found as clusters of adjacent
genes. Such antibiotic gene clusters range in size from about 10 kilobases (kb) to more
than 100 kb. The clusters often contain specific regulatory genes and genes for resistance
of the producer organism to its own antibiotic (Chater, Ciba Found. Symp. (1992), 171,
144-162).

By identifying and cloning genes of rifamycin biosynthesis, the invention described herein
has now created the genetic basis for synthesizing, by genetic methods, rifamycin analogues
or novel ansamycins which combine structural elements of rifamycin with those of other
ansamycins. This also creates the basis for preparing novel collections of substances based
on the rifamycin biosynthesis gene cluster by combinatorial biosynthesis.

In a first step, it was possible to identify and clone a DNA fragment from the genome of
A. mediterranei which shows homology with known polyketide synthase genes. Sequence
information from this DNA fragment confirmed a sequence typical of polyketide synthases;
specific DNA probes derived from this fragment were then used to screen a cosmid library
of A. mediterranei for further DNA fragments belonging to the rifamycin gene cluster. As a
result, the complete rifamycin polyketide synthase gene cluster was identified and
sequenced (see SEQ ID NO 3). The gene cluster comprises six open reading frames, which
are referred to hereinafter as ORF A, B, C, D, E and F and which code for the proteins and
polypeptides depicted in SEQ ID NOS 4 to 9.

The gene cluster isolated and characterized in this way represents the basis, for example, 
for targeted optimization of the production of rifamycin, ansamycins or analogues thereof. 
Examples of techniques and possible areas of application available in this connection are 
as follows: 

• Overexpression of individual genes in producer strains with plasmid vectors or by
incorporation into the chromosome.
• Study of the expression and transcriptional regulation of the gene cluster during
fermentation with various producer strains, and optimization thereof through physiological
parameters and appropriate fermentation conditions.
• Identification of regulatory genes and of the DNA binding sites of the corresponding
regulatory proteins in the gene cluster; characterization of the effect of these regulatory
elements on the production of rifamycins or ansamycins, and influencing them by specific
mutation of these genes or of the DNA binding sites.
• Duplication of the complete gene cluster or parts thereof in producer strains.

Besides these applications of the gene cluster to improve production by fermentation, it
can likewise be employed for the biosynthetic preparation of novel rifamycin analogues,
novel ansamycins or ansamycin-like compounds in which the aliphatic bridge is connected
to the aromatic nucleus at only one end. The following possibilities come into
consideration here, for example:

• Inactivation of individual steps in the biosynthesis, for example by gene disruption.
• Mutation of individual steps in the biosynthesis, for example by gene replacement.
• Use of the cluster or fragments thereof as a DNA probe in order to isolate other natural
microorganisms which produce metabolites similar to rifamycin or ansamycins.
• Exchange of individual elements in this gene cluster for those from other gene clusters.
• Use of modified polyketide synthases for setting up libraries of various rifamycin
analogues or ansamycins, which are then tested for their activity (Jackie & Khosla,
Chemistry & Biology (1995), 2, 355-362).
• Construction of mutated actinomycete strains from whose chromosome the natural
rifamycin or ansamycin biosynthesis gene cluster has been partly or completely deleted,
and which can thus be used for expressing genetically modified gene clusters.
• Exchange of individual elements within the gene cluster.

Detailed description of the invention 

The invention relates to a DNA fragment from the genome of Amycolatopsis mediterranei, 
which comprises a DNA region which is involved directly or indirectly in the gene cluster 
responsible for rifamycin synthesis; and the adjacent DNA regions; and functional 
constituents or domains thereof. 

The DNA fragments according to the invention may moreover comprise regulatory
sequences such as promoters, repressor or activator binding sites, repressor or activator
genes and terminators, or structural genes. Likewise part of the invention are any
combinations of these DNA fragments with one another or with other DNA fragments, for
example combinations of promoters, repressor or activator binding sites and/or repressor
or activator genes from an ansamycin gene cluster, in particular from the rifamycin gene
cluster, with foreign structural genes; combinations of structural genes from the ansamycin
gene cluster, especially the rifamycin gene cluster, with foreign promoters; and
combinations of structural genes with one another or with gene fragments which code for
enzymatically active domains and come from various ansamycin biosynthesis systems.
Foreign structural genes, and foreign gene fragments coding for enzymatically active
domains, code, for example, for proteins involved in the biosynthesis of other ansamycins.

A preferred DNA fragment is one directly or indirectly involved in the gene cluster 
responsible for rifamycin synthesis. 

The gene cluster or DNA region described above contains, for example, the genes which
code for the individual enzymes involved in the biosynthesis of ansamycins and, in
particular, of rifamycin, and the regulatory elements which control the expression of the
biosynthesis genes. Such antibiotic gene clusters range in size from about 10 kilobases (kb)
to over 100 kb. The gene clusters normally comprise specific regulatory genes and genes for
resistance of the producer organism to its own antibiotic. Examples of enzymes or
enzymatically active domains involved in this biosynthesis are those necessary for
synthesizing ansamycins such as rifamycin starting from 3-amino-5-hydroxybenzoic acid,
for example polyketide synthases, acyltransferases, dehydratases, ketoreductases, acyl
carrier proteins or ketoacyl synthases.

Thus, the complete sequence of the gene cluster shown in SEQ ID NO 3, as well as DNA
fragments which comprise sequence portions coding for a polyketide synthase or an
enzymatically active domain thereof, are particularly preferred. Examples of such preferred
DNA fragments are those which code for one or more of the proteins and polypeptides
depicted in SEQ ID NOS 4, 5, 6, 7, 8 and 9, or functional derivatives thereof, including
partial sequences thereof which comprise, for example, 15 or more consecutive nucleotides.
Other preferred embodiments relate to DNA regions of the gene cluster according to the
invention, or fragments thereof, such as those present in the deposited clones pNE95,
pRi44-2 and pNE112 or derived therefrom. Further preferred DNA fragments are those
comprising sequence portions which display homologies with the sequences comprised by
the clones pNE95, pRi44-2 and/or pNE112 or with SEQ ID NOS 1 and/or 3, and which
therefore can be used as hybridization probes within a genomic gene bank of an
ansamycin-producing, in particular rifamycin-producing, organism for finding constituents
of the corresponding gene cluster. The DNA fragment may moreover, for example,
comprise exclusively genomic DNA. A particularly preferred DNA fragment is one which
comprises the nucleotide sequence depicted in SEQ ID NO 1 or 3, or partial sequences
thereof which, by reason of homologies, can be regarded as structural or functional
equivalents of said sequence or partial sequence and which are therefore able to
hybridize with this sequence.

The DNA fragments according to the invention comprise, for example, sequence portions 
which comprise homologies with the above-described enzymes, enzyme domains or 
fragments thereof. 

The terms "homologies" and "structural and/or functional equivalents" refer primarily to
DNA and amino acid sequences with few or minimal differences between the relevant
sequences. These differences may have very diverse causes. For example, they may be
mutations or strain-specific differences which occur naturally or are artificially induced.
Alternatively, the differences from the initial sequence may derive from a targeted
modification, which can be introduced, for example, during a chemical synthesis.

Functional differences can be regarded as minimal if, for example, the nucleotide sequence
coding for a polypeptide, or the protein sequence, has essentially the same characteristic
properties as the initial sequence, whether in respect of enzymatic activity, immunological
reactivity or, in the case of a nucleotide sequence, gene regulation.

Structural differences can be regarded as minimal as long as there is a significant overlap
or similarity between the various sequences, or they have at least similar physical
properties. The latter include, for example, electrophoretic mobility, chromatographic
behaviour, sedimentation coefficients and spectrophotometric properties.

In the case of nucleotide sequences, the agreement should be at least 70%, but preferably
80% and very particularly preferably 90% or more. In the case of amino acid sequences,
the corresponding figures are at least 50%, but preferably 60% and particularly preferably
70%; an agreement of 90% is very particularly preferred.
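The identity thresholds above presuppose a way of scoring agreement between two sequences. As a minimal sketch, assuming the two sequences have already been aligned by some other tool (equal-length strings with '-' for gaps), percent identity can be computed column by column and compared with the stated tiers. The function names, the tier labels and the gap convention (a gap opposite a residue counts as a mismatch; columns where both sequences have a gap are ignored) are illustrative assumptions, not a method prescribed by the text.

```python
# Identity thresholds (percent) taken from the paragraph above, highest first.
NUCLEOTIDE_TIERS = [(90.0, "very particularly preferred"),
                    (80.0, "preferred"),
                    (70.0, "minimum")]
PROTEIN_TIERS = [(90.0, "very particularly preferred"),
                 (70.0, "particularly preferred"),
                 (60.0, "preferred"),
                 (50.0, "minimum")]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A gap ('-') opposite a residue counts as a mismatch; columns in which
    both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def classify(identity: float, tiers) -> str:
    """Return the label of the highest tier whose threshold is reached."""
    for threshold, label in tiers:
        if identity >= threshold:
            return label
    return "below minimum"


print(percent_identity("ATGCATGCAT", "ATGCATGCTT"))  # prints 90.0
```

With 90.0, `classify` returns the top tier of `NUCLEOTIDE_TIERS`. Other identity definitions (for example, dividing by the length of the shorter ungapped sequence) give different values for the same alignment, so the convention has to be fixed before values are compared.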
The invention furthermore relates to a method for identifying, isolating and cloning one of
the DNA fragments described above. A preferred method comprises, for example, the 
following steps: 

a) setting up of a genomic gene bank, 

b) screening of this gene bank with the assistance of the DNA sequences according to the 
invention, and 

c) isolation of the clones identified as positive. 

A general method for identifying DNA fragments involved in the biosynthesis of ansamycins
comprises, for example, the following steps:

1) Cloning of a DNA fragment which shows homology with known polyketide synthase 
genes. 

a) The presence of DNA fragments having homology with the polyketide synthase genes 
according to the invention is detected in the strains of the microorganism to be 
investigated by a Southern experiment with chromosomal DNA of this strain. The size of 
such homologous DNA fragments can be determined by digesting the DNA with a 
suitable restriction enzyme. 

b) Production of a plasmid gene bank comprising the above digested chromosomal
fragments. Normally, individual clones of this gene bank are tested once again for 
homology with the polyketide synthase genes according to the invention. Clones with 
recombinant plasmids comprising fragments having homology with the polyketide probe 
are then normally isolated on the basis of this homology. 

2) Analysis of the cloned region 

a) Restriction analysis of the isolated recombinant plasmids and checking of the identity 
of these cloned fragments with one another. 

b) By chromosomal Southern hybridization of DNA of the original microorganism with the
isolated DNA fragment as probe, it can be demonstrated that the cloned fragment is an
original chromosomal DNA fragment from the original microorganism.

c) It is possible as an option to demonstrate a significant homology of the cloned DNA 
fragment with chromosomal DNA from other ansamycin producers (streptovaricin, 
tolypomycin, geldanamycin, ansamitocin). This would confirm that the cloned DNA is 
typical of gene clusters of ansamycin biosynthesis and thus also of rifamycin 
biosynthesis. 
d) DNA sequencing of an internal restriction fragment and demonstration by comparative
sequence analysis that the cloned region is a typical DNA sequence of the polyketide
synthases that code for the biosynthesis of polyketide antibiotics in actinomycetes.

3) Isolation and characterization of adjacent DNA regions

a) Construction of a cosmid gene bank from the original microorganism and analysis 
thereof for homology with the isolated fragments. Isolation of cosmids having homology 
with this fragment. 

b) Demonstration by restriction analysis that the isolated cosmid clones comprise a DNA 
region of the original microorganism which overlaps with the original fragment. 
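Step 1 a) above sizes homologous fragments by restriction digestion followed by Southern hybridization. Once a sequence is known (for example, that of a cloned fragment), the expected fragment sizes can also be predicted in silico. The sketch below is an illustration under stated assumptions: the DNA is treated as linear, only exact matches to the recognition site are counted, and the cut position on the top strand is supplied by the caller (1 for BamHI, which cuts G^GATCC). The function name and the example sequence are hypothetical.

```python
import re


def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths from cutting a linear DNA sequence.

    Every exact occurrence of `site` is cut on the top strand at
    `cut_offset` bases into the site (e.g. 1 for BamHI, G^GATCC).
    """
    seq = sequence.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive boundaries delimit the fragments; empty fragments are skipped.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


example = "AAGGATCCTTTTGGATCCAA"  # hypothetical 20 bp sequence with two BamHI sites
print(digest(example, "GGATCC", 1))  # prints [3, 10, 7]
```

In a Southern experiment, the band that hybridizes with the probe corresponds to whichever predicted fragment contains the probe sequence.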

As described above, the first step in the isolation of the DNA fragments according to the
invention is normally the setting up of genomic gene banks from the organism of interest,
which synthesizes the required ansamycin, especially rifamycin.
Genomic DNA can be obtained from a host organism in various ways, for example by
extraction from the nuclear fraction and purification of the extracted DNA by known 
methods. 

To set up a representative gene bank, the genomic DNA to be cloned must be fragmented
to a size suitable for insertion into a cloning vector. This can be done either by
mechanical shearing or, preferably, by cutting with suitable restriction enzymes.

Suitable cloning vectors, which are already in routine use for producing genomic gene 
libraries, comprise, for example, cosmid vectors, plasmid vectors or phage vectors. 

It is then possible in a screening program to obtain suitable clones which comprise the 
required gene(s) or gene fragment(s) from the gene libraries produced in this way. 

One possibility for identifying the required DNA region consists in, for example, using the 
gene bank described above to transform strains which, because of a blocked synthetic 
pathway, are unable to produce ansamycins, and identifying those clones which are again 
able after the transformation to produce ansamycin (revertants). The vectors which lead to 
revertants comprise a DNA fragment which is required in ansamycin synthesis. 
Another possibility for identifying the required DNA region is based, for example, on using
suitable probe molecules (DNA probe) which are obtained for example as described above. 
Various standard methods are available for identifying suitable clones, such as differential 
colony hybridization or plaque hybridization. 

It is possible to use as probe molecule a previously isolated DNA fragment from the same or 
a structurally related gene or gene cluster which, because of the homologies present, is 
able to hybridize with the corresponding sequence section within the required gene or gene 
cluster to be identified. Preferably used as probe molecule for the purpose of the present 
invention is a DNA fragment obtainable from a gene or a DNA sequence involved in the 
synthesis of polyketides such as ansamycins or soraphens. 

If the nucleotide sequence of the gene to be isolated, or at least part of this sequence, is
known, it is possible in an alternative embodiment to use, based on this sequence
information, a corresponding synthetic DNA sequence for the hybridizations or PCR
amplifications.

In order to facilitate detectability of the required gene or else parts of a required gene, one 
of the DNA probe molecules described above can be labelled with a suitable, easily 
detectable group. A detectable group for the purpose of this invention means any material 
which has a particular, easily identifiable, physical or chemical property. 

Particular mention may be made at this point of enzymatically active groups such as
enzymes, enzyme substrates, coenzymes and enzyme inhibitors, furthermore fluorescent
and luminescent agents, chromophores and radioisotopes such as ³H, ³⁵S, ³²P, ¹²⁵I and ¹⁴C.
Easy detectability of these markers is based, on the one hand, on their intrinsic physical
properties (for example fluorescent markers, chromophores, radioisotopes) or, on the other
hand, on their reaction and binding properties (for example enzymes, substrates,
coenzymes, inhibitors). Materials of these types are already widely used, in particular in
immunoassays, and in most cases can also be used in the present application.

General methods relating to DNA hybridization are described, for example, by Maniatis T. et 
al., Molecular Cloning, Cold Spring Harbor Laboratory Press (1982). 



Those clones within the previously described gene libraries which are able to hybridize with 
a probe molecule and which can be identified by one of the abovementioned detection 
methods can then be further analysed in order to determine the extent and nature of the 
coding sequence in detail. 

An alternative method for identifying cloned genes is based on constructing a gene library 
consisting of plasmid or expression vectors. This entails, in analogy to the methods 
described previously, the genomic DNA comprising the required gene being initially isolated 
and then cloned into a suitable plasmid or expression vector. The gene libraries produced in 
this way can then be screened by suitable procedures, for example by use of 
complementation studies, and those clones which comprise the required gene or else at 
least a part of this gene as insert can be selected. 

It is thus possible with the aid of the methods described above to isolate a gene, several 
genes or a gene cluster which code for one or more particular gene products. 

For further characterization, the DNA sequences purified and isolated in the manner 
described above are subjected to restriction analysis and sequence analysis. 

For sequence analysis, the previously isolated DNA fragments are first fragmented using 
suitable restriction enzymes, and then cloned into suitable cloning vectors. In order to avoid 
mistakes in the sequencing, it is advantageous to sequence both DNA strands completely. 
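Sequencing both strands helps because every base is then read twice, once directly and once as its complement. A minimal sketch of the comparison, assuming the forward read and the reverse-strand read already cover exactly the same region with no insertions or deletions between them (in practice the reads would first be aligned): the reverse read is reverse-complemented and compared position by position, and each disagreement marks a base to recheck. The function names are hypothetical.

```python
# Watson-Crick complements; N (an unresolved base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_discrepancies(forward: str, reverse: str) -> list[int]:
    """0-based positions (forward-strand coordinates) where the two reads disagree."""
    expected = reverse_complement(reverse)
    if len(expected) != len(forward):
        raise ValueError("reads must cover the same region")
    return [i for i, (a, b) in enumerate(zip(forward.upper(), expected)) if a != b]


print(strand_discrepancies("ATGCGT", "ACGGAT"))  # prints [2]
```

An empty list means the two strands agree over the whole region; a single read cannot reveal its own miscalls in this way.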

Various alternatives are available for analysing the cloned DNA fragment in respect of its 
function within ansamycin biosynthesis. 

Thus, for example, complementation experiments with defective mutants make it possible not
only to establish in principle the involvement of a gene or gene fragment in secondary
metabolite biosynthesis, but also to verify specifically the synthetic step in which said DNA 
fragment is involved. 



In an alternative type of analysis, evidence is obtained in exactly the opposite way. Transfer
of plasmids which comprise DNA sections homologous to corresponding sections on the
genome results in integration of these homologous DNA sections via homologous
recombination. If, as in the present case, the homologous DNA section is a region within an
open reading frame of the gene cluster, plasmid integration inactivates this gene by
so-called gene disruption and consequently interrupts secondary metabolite production.
According to current knowledge, a homologous region of at least 100 bp, but preferably
more than 1000 bp, is assumed to be sufficient to bring about the required recombination
event.

However, a homologous region which extends over a range of from 0.3 to 4 kb, but in 
particular over a range of from 1 to 3 kb, is preferred. 
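The length criteria in the two preceding paragraphs can be expressed as a simple check on candidate disruption fragments. The sketch below encodes those figures (at least 100 bp; 0.3 to 4 kb preferred; 1 to 3 kb particularly preferred) and adds the standard condition for disruption by single-crossover integration: the cloned fragment should lie strictly inside the open reading frame, so that neither gene copy produced by the integration is complete. The function names, tier labels and the coordinate convention (0-based, end-exclusive) are illustrative assumptions.

```python
def homology_tier(length_bp: int) -> str:
    """Classify a homologous region by the length ranges given in the text."""
    if length_bp < 100:
        return "too short"
    if 1_000 <= length_bp <= 3_000:
        return "particularly preferred"
    if 300 <= length_bp <= 4_000:
        return "preferred"
    return "sufficient"  # at least 100 bp, but outside the preferred ranges


def is_internal(frag_start: int, frag_end: int, orf_start: int, orf_end: int) -> bool:
    """True if the fragment lies strictly inside the ORF (0-based, end-exclusive).

    After single-crossover integration of an internal fragment, one gene copy
    lacks its 3' end and the other lacks its 5' end.
    """
    return orf_start < frag_start and frag_end < orf_end


def suitable_for_disruption(frag_start: int, frag_end: int,
                            orf_start: int, orf_end: int) -> bool:
    """Combine the length criterion with the internal-fragment condition."""
    return (homology_tier(frag_end - frag_start) != "too short"
            and is_internal(frag_start, frag_end, orf_start, orf_end))


print(homology_tier(2000), suitable_for_disruption(500, 2500, 0, 3000))
```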

To prepare suitable plasmids which have sufficient homology for integration via homologous 
recombination there is preferably provision of a subcloning step in which the previously 
isolated DNA is digested, and fragments of suitable size are isolated and subsequently 
cloned into a suitable plasmid. Examples of suitable plasmids are the plasmids generally
used for genetic manipulations in streptomycetes or E. coli.

It is possible in principle to use for the preparation and multiplication of the previously 
described constructs all conventional cloning vectors such as plasmid or bacteriophage 
vectors as long as they have replication and control sequences derived from species 
compatible with the host cell. 

The cloning vector usually has an origin of replication plus specific genes which confer
phenotypic selection features on the transformed host cell, in particular resistance to
antibiotics. After transformation, host cells carrying the vector can be selected on the
basis of these phenotypic markers.

Selectable phenotypic markers which can be used for the purpose of this invention
comprise, for example, without this representing a limitation of the subject-matter of the
invention, resistance to thiostrepton, ampicillin, tetracycline, chloramphenicol, hygromycin,
G418, kanamycin, neomycin and bleomycin. Another selectable marker can be, for
example, prototrophy for particular amino acids.
Mainly preferred for the purpose of the present invention are streptomycetes and E. coli
plasmids, for example the plasmids used for the purpose of the present invention. 

Host cells primarily suitable for the previously described cloning for the purpose of this
invention are prokaryotes, including bacterial hosts such as streptomycetes, actinomycetes,
E. coli or pseudomonads.

E. coli hosts are particularly preferred, for example the E. coli strain HB101 or XL1-Blue
MR (Stratagene), as are streptomycetes such as the plasmid-free Streptomyces lividans
strains TK23 and TK24.

Competent cells of the E. coli strain HB101 are produced by the methods normally used for
transforming E. coli. For streptomycetes, the transformation method of Hopwood et al.
(Genetic Manipulation of Streptomyces: A Laboratory Manual, The John Innes Foundation,
Norwich (1985)) is normally used.

After transformation and subsequent incubation on a suitable medium, the resulting
colonies are subjected to differential screening by plating out on selective media. The
appropriate plasmid DNA can then be isolated from those colonies which comprise
plasmids with cloned DNA fragments.

The DNA fragment according to the invention, which comprises a DNA region which is 
involved directly or indirectly in the biosynthesis of ansamycin and can be obtained in the 
previously described manner from the ansamycin biosynthesis gene cluster, can also be 
used as starter clone for identifying and isolating other adjacent DNA regions overlapping 
therewith from said gene cluster. 

This can be achieved, for example, by so-called chromosome walking within a gene
library consisting of DNA fragments with mutually overlapping DNA regions, using the
previously isolated DNA fragment or else, in particular, the sequences located at its 5'
and 3' ends. The procedures for chromosome walking are known to the person skilled in
the art. Details can be found, for example, in the publications by Smith et al. (Methods
Enzymol. (1987), 151, 461-489) and Wahl et al. (Proc. Natl. Acad. Sci. USA (1987), 84,
2160-2164).

The prerequisite for chromosome walking is the presence of clones having coherent DNA 
fragments which are as long as possible and mutually overlap within a gene library, and a 
suitable starter clone which comprises a fragment which is located in the vicinity or else, 
preferably, within the region to be analysed. If the exact location of the starter clone is 
unknown, the walking is preferably carried out in both directions. 

The actual walking step starts by using the identified and isolated starter clone as a probe
in one of the previously described hybridization reactions in order to detect adjacent
clones which have regions overlapping with the starter clone. Hybridization analysis makes
it possible to establish which fragment projects furthest beyond the overlapping region.
This fragment is then used as the starting clone for the second walking step, in which the
fragment overlapping this second clone in the same direction is identified. Continuing in
this manner along the chromosome yields a collection of overlapping DNA clones which
cover a large DNA region. These can then, where appropriate after one or more subcloning
steps, be ligated together by known methods to give a fragment which comprises some or,
preferably, all of the constituents essential for ansamycin biosynthesis.
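The walking procedure above is essentially a greedy loop: from the current clone, among the clones that overlap its leading end, take the one that projects furthest, and repeat until no clone extends further. The sketch below assumes that each clone's extent is already known as an interval on a common map (hypothetical coordinates in kb); in the laboratory that information comes from the hybridization and restriction analyses described above. The `Clone` type, the `walk` function and the example library are hypothetical.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical map coordinate of the left end, in kb
    end: int    # hypothetical map coordinate of the right end, in kb


def walk(start_clone: Clone, library: list[Clone], direction: str = "right") -> list[Clone]:
    """Greedy chromosome walk from start_clone in one direction.

    At each step, pick the clone that overlaps the current clone's leading
    end and projects furthest beyond it; stop when no clone extends further.
    """
    path = [start_clone]
    current = start_clone
    while True:
        if direction == "right":
            # Clones overlapping the right end and extending past it.
            candidates = [c for c in library if c.start < current.end < c.end]
            if not candidates:
                break
            current = max(candidates, key=lambda c: c.end)
        else:
            # Clones overlapping the left end and extending past it.
            candidates = [c for c in library if c.start < current.start < c.end]
            if not candidates:
                break
            current = min(candidates, key=lambda c: c.start)
        path.append(current)
    return path


library = [Clone("A", 0, 40), Clone("B", 30, 70), Clone("C", 35, 60),
           Clone("D", 65, 100), Clone("E", -20, 10)]
# Walk in both directions, as advised when the starter clone's position is unknown.
print([c.name for c in walk(library[0], library, "right")])  # prints ['A', 'B', 'D']
print([c.name for c in walk(library[0], library, "left")])   # prints ['A', 'E']
```

Clone C is skipped because B also overlaps A and reaches further; always taking the furthest-reaching overlap keeps the number of walking steps small. The loop terminates because the leading end moves strictly outward at every step.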

The hybridization reaction to establish clones with overlapping marginal regions preferably
uses not the very large and unwieldy complete fragment but, in its place, a partial
fragment from the left or right marginal region, which can be obtained by a subcloning step.
Because of the smaller size of this partial fragment, the hybridization reaction yields fewer
positive hybridization signals, so that the analytical effort is distinctly less than when the
complete fragment is used. It is furthermore advisable to characterize the partial fragment
in detail in order to rule out that it contains larger amounts of repetitive sequences, which
may be distributed over the entire genome and would thus greatly impede a targeted
sequence of walking steps.

Since the gene cluster responsible for ansamycin biosynthesis covers a relatively large
region of the genome, it may also be advantageous to carry out so-called large-step
walking or cosmid walking. In these cases, by using cosmid vectors which permit the
cloning of very large DNA fragments, it is possible to cover a very large DNA region,
of up to 42 kb, in a single walking step.

In one possible embodiment of the present invention, for example to construct a cosmid
gene bank from streptomycetes or actinomycetes, total DNA is isolated with DNA fragments
of the order of about 100 kb in size, and is subsequently partially digested with suitable
restriction endonucleases.

The digested DNA is then extracted in a conventional way in order to remove any endonuclease
still present, and is precipitated and finally concentrated. The resulting fragment
concentrate is then fractionated according to the size of the individual fragments, for
example by density gradient centrifugation. After the fractions obtained in this way have
been dialysed, they can be analysed on an agarose gel. The fractions which contain
fragments of suitable size are pooled and concentrated for further processing.
Fragments regarded as particularly suitable for the purpose of this invention have a
size of the order of 30 kb to 42 kb, preferably 35 kb to 40 kb.
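
The pooling decision can be written down as a simple filter. This is a minimal sketch assuming that, for each gradient fraction, the smallest and largest fragment sizes have been read off the agarose gel against a DNA size standard; the readings below are hypothetical, and `pool_fractions` is an illustrative name.

```python
def pool_fractions(fractions, low_kb=30.0, high_kb=42.0):
    """Return the numbers of gradient fractions whose DNA lies entirely within [low_kb, high_kb].

    `fractions` is a list of (fraction_no, smallest_kb, largest_kb) tuples read from a gel.
    """
    return [no for no, smallest, largest in fractions
            if low_kb <= smallest and largest <= high_kb]


# Hypothetical gel readings for five gradient fractions.
readings = [(1, 45, 60), (2, 38, 48), (3, 32, 41), (4, 30, 38), (5, 20, 31)]
print(pool_fractions(readings))  # [3, 4]
```

Passing `low_kb=35.0, high_kb=40.0` applies the preferred, narrower window instead.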

In parallel with the fragmentation described above, or later, a suitable cosmid vector,
for example pWE15 (Stratagene), is completely digested with a suitable restriction enzyme,
for example BamHI, for the subsequent ligase reaction.

Ligation of the cosmid DNA to the size-fractionated Streptomyces or actinomycete fragments
can be carried out using T4 DNA ligase. After a sufficient incubation time, the ligation
mixture obtained in this way is packaged into lambda (λ) phages by generally known methods.

The resulting phage particles are then used to infect a suitable host strain. A recA- E. coli
strain is preferred, such as E. coli HB101 or XL1-Blue (Stratagene). Selection of transfected
clones and isolation of the plasmid DNA can be carried out by generally known methods.

The screening of the gene bank for DNA fragments which are involved in ansamycin 
biosynthesis is carried out, for example, using a specific hybridization probe which is 
assumed (for example on the basis of DNA sequence or DNA homology or 
complementation tests or gene disruption or the function thereof in other organisms) to
comprise DNA regions from the 'ansamycin gene cluster'.

A plasmid which comprises an additional fragment of the required size or has been 
identified on the basis of hybridizations can then be isolated from the gel in the previously 
described manner. The identity of this additional fragment with the required fragment of the 
previously selected cosmid can then be confirmed by Southern transfer and hybridization. 

Function analysis of the DNA fragments isolated in this way can be carried out in a gene 
disruption experiment as described above. 

Another possible use of the DNA fragments according to the invention is to modify or 
inactivate enzymes or domains involved in ansamycin and, in particular, rifamycin 
biosynthesis, or to synthesize oligonucleotides which are then in turn used for finding
homologous sequences by PCR amplification.

Besides the DNA fragments according to the invention as such, also claimed is their use
for producing rifamycin, rifamycin analogues or precursors thereof, and for the
biosynthetic production of novel ansamycins or of precursors thereof. Included in this
connection are those molecules in which the aliphatic bridge is connected to the aromatic
nucleus at one end only.

The DNA fragments according to the invention permit, for example, by combination with 
DNA fragments from other biosynthetic pathways or by inactivation or modification thereof, 
the biosynthesis of novel hybrid compounds, in particular of novel ansamycins or rifamycin 
analogues. The steps necessary for this are generally known and are described, for 
example, in Hopwood, Current Opinion in Biotechnol. (1993), 4, 531-537. 

The invention furthermore relates to the use of the DNA fragments according to the
invention for carrying out the novel technology of combinatorial biosynthesis for the
biosynthetic production of libraries of polyketide synthases based on the rifamycin and
ansamycin biosynthesis genes. If, for example, several sets of modifications are produced,
it is possible in this way to produce biosynthetically a library of polyketides, for
example ansamycins or rifamycin analogues, which then needs only to be tested for the
activity of the compounds produced in this way. The steps necessary for this are generally
known and are described, for example, in Tsoi and Khosla, Chemistry & Biology (1995), 2,
355-362 and WO-9508548.

Besides the DNA fragment as such, also claimed is its use for the genetic construction of 
mutated actinomycetes strains from which the natural rifamycin or ansamycin biosynthesis 
gene cluster in the chromosome has been partly or completely deleted, and which can thus 
be used for expressing genetically modified ansamycin or rifamycin biosynthesis gene 
clusters. 

The invention furthermore relates to a hybrid vector which comprises at least one DNA 
fragment according to the invention, for example a promoter, a repressor or activator 
binding site, a repressor or activator gene, a structural gene, a terminator or a functional 
part thereof. The hybrid vector comprises, for example, an expression cassette which 
comprises a DNA fragment according to the invention which is able to express one or more 
proteins involved in ansamycin biosynthesis and, in particular, in rifamycin biosynthesis, or a
functional fragment thereof. The invention likewise relates to a host organism which 
comprises the hybrid vector described above. 

Suitable vectors representing the starting point of the hybrid vectors according to the 
invention, and suitable host organisms such as bacteria or yeast cells are generally known. 

The host organism can be transformed by generally customary methods, such as by means
of protoplasts, Ca2+, Cs+, polyethylene glycol, electroporation, viruses, lipid vesicles or a
particle gun. The DNA fragments according to the invention may then be present in the host
organism either as extrachromosomal constituents or integrated via suitable sequence
sections into the chromosome of the host organism.

The invention likewise relates to polyketide synthases which comprise the DNA fragments 
according to the invention, in particular those from Amycolatopsis mediterranei which are
involved directly or indirectly in rifamycin synthesis, and functional constituents thereof, for 
example enzymatically active domains. 
The invention furthermore relates to a hybridization probe comprising a DNA fragment
according to the invention, and to the use thereof, in particular for identifying DNA
fragments involved in the biosynthesis of ansamycins.

In order to obtain unambiguous signals in the hybridization, DNA bound to the filter (for 
example made of nylon or nitrocellulose) is normally washed at 55-65°C in 0.2 x SSC (1 x
SSC = 0.15 M sodium chloride, 15 mM sodium citrate). 
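
The wash-buffer arithmetic can be made explicit. This sketch scales the 1 x SSC composition given above; it additionally assumes that buffers are made up from a 20 x SSC stock, a common laboratory convention that is not stated in the text. Function names are illustrative.

```python
# Composition of 1 x SSC as given in the text, in mM.
ONE_X_SSC_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_composition(x: float) -> dict[str, float]:
    """Concentrations (mM) of the components of x-fold SSC."""
    return {component: mm * x for component, mm in ONE_X_SSC_MM.items()}


def stock_volume_ml(final_x: float, final_volume_ml: float, stock_x: float = 20.0) -> float:
    """Volume of SSC stock (assumed 20 x) needed for final_volume_ml of final_x SSC (C1V1 = C2V2)."""
    return final_x * final_volume_ml / stock_x


print(ssc_composition(0.2))        # {'NaCl': 30.0, 'sodium citrate': 3.0}
print(stock_volume_ml(0.2, 1000))  # 10.0
```

So the 0.2 x SSC wash contains 30 mM NaCl and 3 mM sodium citrate, and one litre needs about 10 ml of the assumed 20 x stock.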

Examples 
General 

General molecular genetic techniques such as DNA isolation and purification, restriction
digestion of DNA, agarose gel electrophoresis of DNA, ligation of restriction fragments,
cultivation and transformation of E. coli, and plasmid isolation from E. coli are carried out as
described in Maniatis et al., Molecular Cloning: A Laboratory Manual, 1st Edn., Cold Spring
Harbor Laboratory Press, Cold Spring Harbor NY (1982).

Culture conditions and molecular genetic techniques with A. mediterranei and other
actinomycetes are as described by Hopwood et al. (Genetic Manipulation of Streptomyces: A
Laboratory Manual, The John Innes Foundation, Norwich, 1985). All liquid cultures of
A. mediterranei and other actinomycetes are grown in Erlenmeyer flasks at 28°C on a
shaker at 250 rpm.

Nutrient media used: 

LB: Maniatis et al., Molecular Cloning: A Laboratory Manual, 1st Edn., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor NY (1982)

NL148: Schupp and Divers, FEMS Microbiology Lett. 36, 159-162 (1986) (NL148 = NL148G
without glycine)

R2YE: Hopwood et al. (Genetic Manipulation of Streptomyces: A Laboratory Manual, The
John Innes Foundation, Norwich, 1985)

TB: 12 g/l Bacto tryptone
    24 g/l Bacto yeast extract
    4 ml/l glycerol

Example 1: Detection of chromosomal DNA fragments from A. mediterranei having homology with polyketide synthase genes of other bacteria
To obtain genomic DNA from A. mediterranei, cells of the strain A. mediterranei wt3136
(= LBGA 3136, ETH collection of strains) are cultivated in NL148 medium for 48 hours. 1 ml
of this culture is then transferred into 50 ml of NL148 medium (+ 2.5 g/l glycine) in a 200 ml
Erlenmeyer flask, and the culture is incubated for 48 h. The cells are removed from the
medium by centrifugation at 3000 g for 10 min and are resuspended in 5 ml of SET (75 mM
NaCl, 25 mM EDTA, 20 mM Tris, pH 7.5). High molecular weight DNA is extracted by the
method of Pospiech and Neumann (Trends in Genetics (1995), 11, 217-218).

In order to detect, by a Southern blot, individual fragments of the isolated A. mediterranei
DNA which have homology with polyketide synthase genes, a radioactive DNA probe is
prepared from a known polyketide synthase gene cluster. To do this, the PvuI fragment
3.8 kb in size is isolated from the recombinant plasmid p98/1 (Schupp et al., J. Bacteriol.
(1995), 177, 3673-3679), which comprises a DNA region, about 32 kb in size, of the
polyketide synthase genes for the antibiotic soraphen A. About 0.5 µg of the isolated 3.8 kb
PvuI DNA fragment is radiolabelled with 32P-dCTP using the nick translation system from
Gibco/BRL (Basle) in accordance with the manufacturer's instructions.

For the Southern blot, about 2 µg of the genomic DNA isolated above from A. mediterranei
are completely digested with the restriction enzyme BglII (Boehringer Mannheim), and the
resulting fragments are fractionated on a 0.8% agarose gel. A Southern blot of this
agarose gel with the DNA probe prepared above (3.8 kb PvuI fragment) detects a BglII-cut
DNA fragment about 13 kb in size in the genomic DNA of A. mediterranei which has
homology with the DNA probe used. It can be concluded on the basis of this
homology that the detected DNA fragment from A. mediterranei is a genetic region which
codes for a polyketide synthase and thus is involved in the synthesis of a polyketide
antibiotic.
Example 2: Production of a specific recombinant plasmid collection comprising BglII-digested chromosomal fragments from A. mediterranei 12-16 kb in size
The E. coli positive selection vector pIJ4642 (derivative of pIJ666, Kieser & Melton, Gene
(1988), 65, 83-91) developed at the John Innes Centre (Norwich, UK) is used to produce
the plasmid gene bank. This plasmid is first cut with BamHI, and the two resulting fragments
are fractionated on an agarose gel. The smaller of the two fragments is the filler fragment of
the vector and the larger is the vector portion which, on self-ligation after removal of the filler
fragment, forms a perfect palindrome owing to the flanking fd termination sequences,
which means that the plasmid cannot be propagated as such in E. coli. This vector portion,
3.8 kb in size, is isolated from the agarose gel by electroelution as described on page ISA-
IBS of Maniatis et al., Molecular Cloning: A Laboratory Manual, 1st Edn., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor NY (1982).

To prepare the BglII-cut DNA fragments from A. mediterranei, the high molecular weight
genomic DNA prepared in Example 1 is used. About 10 µg of this DNA are completely
digested with the restriction enzyme BglII and subsequently fractionated on a 0.8% agarose
gel. DNA fragments with a size of about 12-16 kb are cut out of the gel and recovered from
the gel block by electroelution (see above). About 1 µg of the BglII fragments isolated in this
way is ligated to about 0.1 µg of the BamHI vector portion of pIJ4642 isolated above.
The ligation mixture obtained in this way is then transformed into the E. coli strain HB101
(Stratagene). About 150 transformed colonies are selected from the transformation mixture
on LB agar with 30 µg per ml chloramphenicol. These colonies contain recombinant
plasmids with BglII-cut genomic DNA fragments from A. mediterranei in the size range
12-16 kb.
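
BglII and BamHI leave identical GATC overhangs, which is why BglII fragments can be ligated into the BamHI-cut vector. The insert-to-vector molar ratio of this ligation can be estimated with the usual approximation of 660 g/mol per base pair. This sketch takes the vector amount as 0.1 µg and uses 14 kb, the midpoint of the 12-16 kb range, as the insert length; both are assumptions made for the calculation, and `pmol` is an illustrative helper name.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one base pair of double-stranded DNA


def pmol(mass_ug: float, length_bp: float) -> float:
    """Picomoles of a linear double-stranded DNA species of the given mass and length."""
    grams = mass_ug * 1e-6
    mol = grams / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return mol * 1e12


insert_pmol = pmol(1.0, 14_000)  # ~1 ug of 12-16 kb BglII fragments (midpoint 14 kb assumed)
vector_pmol = pmol(0.1, 3_800)   # ~0.1 ug of the 3.8 kb BamHI vector portion
ratio = insert_pmol / vector_pmol
print(f"insert:vector molar ratio ~ {ratio:.1f}:1")  # insert:vector molar ratio ~ 2.7:1
```

Under these assumptions the insert is in a roughly threefold molar excess over the vector.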

Example 3: Cloning and characterization of chromosomal A. mediterranei DNA fragments having homology with bacterial polyketide synthase genes
150 of the plasmid clones prepared in Example 2 are analysed by colony hybridization
using a nitrocellulose filter (Schleicher & Schuell) as described on pages 318-319 of
Maniatis et al., Molecular Cloning: A Laboratory Manual, 1st Edn., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor NY (1982). The DNA probe used is the 3.8 kb PvuI
fragment of the plasmid p98/1, isolated and radiolabelled with 32P-dCTP in Example 1. The
plasmids are isolated from 5 plasmid clones which show a hybridization signal, and are
characterized by two restriction digestions, one with HindIII and one with KpnI. HindIII cuts
twice in the vector portion of the clones, 0.3 kb to the right and left of the BamHI cleavage
site into which the A. mediterranei DNA has been integrated. KpnI does not cut in the
pIJ4642 vector portion. This restriction analysis shows that the investigated clones all
comprise identical HindIII fragments of about 14 and 3.1 kb and identical KpnI
fragments approximately 11.4 kb and 5.7 kb in size. This shows that these clones comprise
the same genomic BglII fragment of A. mediterranei, and that this fragment has a size of about
13 kb. It can additionally be concluded from this restriction analysis that the cloned BglII
fragment has no internal HindIII cleavage site but has 2 KpnI cleavage sites, which yield
an internal KpnI fragment 5.7 kb in size.
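
The conclusions drawn from the two digests follow from simple bookkeeping on a circular plasmid: the fragments sum to the plasmid size, and n cleavage sites give n fragments. This sketch uses only the sizes stated above (3.8 kb vector portion, HindIII and KpnI fragment sizes); the function names are illustrative.

```python
VECTOR_KB = 3.8  # BamHI vector portion of pIJ4642, from Example 2


def insert_kb(fragments_kb: list[float], vector_kb: float = VECTOR_KB) -> float:
    """Insert size = total size of the circular recombinant plasmid minus the vector."""
    return sum(fragments_kb) - vector_kb


def sites_in_insert(fragments_kb: list[float], sites_in_vector: int) -> int:
    """A circular plasmid cut at n sites gives n fragments; subtract the vector's sites."""
    return len(fragments_kb) - sites_in_vector


hindiii = [14.0, 3.1]  # HindIII cuts twice in the vector, 0.3 kb either side of the BamHI site
kpni = [11.4, 5.7]     # KpnI does not cut the vector

print(round(insert_kb(hindiii), 1), sites_in_insert(hindiii, sites_in_vector=2))  # 13.3 0
print(round(insert_kb(kpni), 1), sites_in_insert(kpni, sites_in_vector=0))        # 13.3 2
```

Both digests give an insert of about 13.3 kb, consistent with the "about 13 kb" BglII fragment, with no HindIII site and two KpnI sites inside the insert.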

The plasmid DNA of the above 5 clones with identical restriction fragments is further 
characterized by a Southern blot. For this purpose, the plasmids are cut with HindIII and
KpnI, and the DNA probe used is the 32P-radiolabelled 3.8 kb PvuI fragment of the plasmid
p98/1 used above. This experiment confirms that the 5 plasmids contain identical
A. mediterranei DNA fragments and that these have significant homology with the DNA
probe, which is characteristic of bacterial polyketide synthase genes. In addition, the
Southern blot shows that the internal KpnI fragment 5.7 kb in size likewise has significant
homology with the DNA probe used. The plasmid called pRi7-3 is selected from the 5
plasmids for further processing. 

To demonstrate that the cloned BglII fragment about 13 kb in size from A. mediterranei is an
original chromosomal DNA fragment, another Southern blot is carried out. Chromosomal
DNA from A. mediterranei which has been cut with BglII, KpnI or BamHI is employed in this
blot. Two BamHI fragments which are about 1.8 and 1.9 kb in size and are present in the
5.7 kb KpnI fragment of pRi7-3 are used as radiolabelled DNA probes. This experiment
confirms that the BglII DNA fragment about 13 kb in size cloned in the recombinant plasmid
pRi7-3 is an authentic genomic DNA fragment from A. mediterranei. In addition, this
experiment confirms that the cloned fragment comprises an internal KpnI fragment 5.7 kb in
size and two BamHI fragments about 1.8 and 1.9 kb in size, and that these DNA fragments
are likewise authentic genomic DNA fragments from A. mediterranei.


Example 4: Demonstration of a significant homology of the cloned genomic 13 kb BglII fragment from A. mediterranei with chromosomal DNA from other actinomycetes which produce ansamycins
Demonstration of a significant homology between the cloned chromosomal DNA region of 
A. mediterranei and chromosomal DNA from other ansamycin-producing actinomycetes
takes place by a Southern blot experiment. The following ansamycin-producing strains are
employed for this purpose (the ansamycins produced by the strains are in parentheses):
Streptomyces spectabilis (streptovaricins), Streptomyces tolypophorus (tolypomycins),
Streptomyces hygroscopicus (geldanamycins) and Nocardia species ATCC31281
(ansamitocins). Genomic DNA from these strains is isolated as described for A. mediterranei
in Example 1 and digested with the restriction enzyme KpnI, and the restriction fragments
obtained in this way are fractionated on an agarose gel for the Southern blot. The two BamHI
fragments about 1.8 and 1.9 kb in size from A. mediterranei which were used in Example 3,
isolated from the plasmid pRi7-3, are used as radioactive probes. This experiment
shows that these ansamycin-producing strains have a significant DNA homology with the
DNA probe used and thus with the cloned chromosomal region of A. mediterranei. It should be
noted in this connection that the homology in the case of producers of ansamycins with a
naphthoquinoid ring system (streptovaricin, tolypomycin) is greater than in the case of those
with a benzoquinoid ring system (geldanamycin, ansamitocin). This result suggests that the
cloned chromosomal DNA region from A. mediterranei is typical of ansamycin biosynthesis
gene clusters and, especially, of gene clusters for ansamycins with naphthoquinoid ring
systems, corresponding to the ring system in rifamycins. 

Example 5: DNA sequence determination of the KpnI fragment 5.7 kb in size located within the cloned 13 kb BglII fragment
For the sequencing, the 5.7 kb KpnI fragment is isolated from the plasmid pRi7-3 (DSM
11114) (Maniatis et al. 1992) and subcloned into the KpnI cleavage site of the vector
pBRKanf4, which is suitable for DNA sequencing, affording the plasmids pTS004 and
pTS005. The vector pBRKanf4 (derived from pBRKanf1; Bhat, Gene (1993) 134, 83-87) is
suitable for introducing sequential deletions of Sau3A fragments in the cloned insert
fragment, because the vector itself does not contain a GATC nucleotide sequence. In addition,
the BamHI fragments 1.9 and 1.8 kb in size present in the 5.7 kb KpnI fragment are
subcloned into the BamHI cleavage site of pBRKanf4, resulting in the plasmids pTS006 and
pTS007, and pTS008 and pTS009, respectively.
To prepare subclones sequentially truncated by Sau3A fragments for the DNA sequencing,
the plasmids pTS004 to pTS009 are partially digested with Sau3A and completely digested
with XbaI or HindIII (a cleavage site in the multiple cloning region of the vector). The DNA
obtained in this way (consisting of the linearized vector with inserted DNA fragments
truncated by Sau3A fragments) is filled in at the ends using Klenow polymerase (fragment of
polymerase I, see Maniatis et al. pages 113-114), self-ligated with T4 DNA ligase and
transformed into E. coli DH5α. Plasmid DNA which corresponds to the plasmids pTS004 to
pTS009 but in which the originally integrated A. mediterranei fragments are truncated from
one side by Sau3A fragments is isolated from the individual transformed clones obtained
in this way.
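
Because pBRKanf4 contains no GATC sequence, every Sau3A cut falls inside the insert, and the set of possible deletion subclones is fixed by the positions of the GATC sites. This is a minimal sketch on a toy sequence (not SEQ ID NO 1); it assumes the complete XbaI or HindIII cut lies just beyond the right-hand end of the insert, and it ignores the few bases added by Klenow fill-in. The function names are illustrative.

```python
import re


def gatc_sites(seq: str) -> list[int]:
    """0-based start positions of all Sau3A recognition sites (GATC)."""
    # GATC cannot overlap itself, so non-overlapping regex matches find every site.
    return [m.start() for m in re.finditer("GATC", seq.upper())]


def nested_deletion_lengths(insert: str) -> list[int]:
    """Insert length retained in each possible deletion subclone, longest first.

    Each subclone keeps the insert from its left end up to one Sau3A site.
    """
    return sorted(gatc_sites(insert), reverse=True)


insert = "AAGATCTTTTGATCAA"             # toy sequence for illustration only
print(gatc_sites(insert))               # [2, 10]
print(nested_deletion_lengths(insert))  # [10, 2]
print(gatc_sites("ACGTTTAAACCGG"))      # []: a GATC-free stand-in for the vector
```

Sequencing from the primer next to the deleted end then reads into a different region of the insert for each subclone, which is what allows the whole fragment to be covered.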

The DNA sequencing is carried out with the plasmids obtained in this way and with pTS004 
to pTS009 using the reaction kit from Perkin-Elmer/Applied Biosystems with dye-labelled 
terminator reagents (Kit No. 402122) and a universal primer or a T7 primer. A standard cycle
sequencing protocol with a thermocycler (MJ Research DNA Engine Thermocycler, Model
225) is used, and the sequencing reactions are analysed on the Applied Biosystems
automatic DNA sequencer (Model 373 or 377) in accordance with the manufacturer's
instructions. To analyse the results, the following computer programs (software) are
employed: Applied Biosystems DNA analysis software, Unix Solaris CDE software, the DNA
assembly and analysis package GAP licensed from R. Staden (Nucleic Acids Research
(1995) 23, 1406-1410) and Blast (NCBI).

The methods described above can be used to sequence both DNA strands of the
5.7 kb KpnI fragment from A. mediterranei strain wt3136 completely. The DNA sequence of the
5.7 kb fragment, with a length of 5676 base pairs, is depicted in SEQ ID NO 1.

Example 6: Analysis of the protein-encoding region (genes) on the 5.7 kb KpnI fragment from A. mediterranei
The nucleotide sequence of the 5.7 kb KpnI fragment is analysed using the Codonpreference
computer program (Genetics Computer Group, University of Wisconsin, 1994). This analysis
shows that the fragment is protein-encoding over its whole length and thus forms
part of a larger open reading frame (ORF). The codons used in this ORF are typical of
streptomycete and actinomycete genes. The amino acid sequence derived from the DNA
sequence of this ORF is depicted in SEQ ID NO 2.
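
One signal exploited by codon-preference analysis of GC-rich actinomycete DNA is that genes in such organisms typically carry G or C at a very high proportion of third codon positions (GC3). The sketch below computes only GC3 for the three forward reading frames; it is a simplification for illustration, not the Codonpreference algorithm, it ignores the reverse strand, and the toy sequence is invented.

```python
def gc3_by_frame(seq: str) -> dict[int, float]:
    """Fraction of G+C at the third codon position for each of the three forward frames."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        # Third positions of the complete codons starting at frame, frame + 3, ...
        thirds = seq[frame + 2::3]
        result[frame] = sum(base in "GC" for base in thirds) / len(thirds) if thirds else 0.0
    return result


def likely_coding_frame(seq: str) -> int:
    """Forward frame with the highest GC3, taken here as the most likely coding frame."""
    gc3 = gc3_by_frame(seq)
    return max(gc3, key=gc3.get)


toy = "AAGTTCAAGTTC"  # toy sequence with G/C only at frame-0 third positions
print(gc3_by_frame(toy))         # {0: 1.0, 1: 0.0, 2: 0.0}
print(likely_coding_frame(toy))  # 0
```

In a real analysis the statistic would be computed in a sliding window along the 5676 bp sequence, so that a frame which stays high over the whole length marks a continuous coding region.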

Polyketide synthases for macrolide antibiotics (such as erythromycin and rapamycin) are very
large multifunctional proteins comprising several enzymatically active domains which are
now well characterized (Hopwood and Khosla, Ciba Foundation Symposium (1992), 171, 88-112;
Donadio and Katz, Gene (1992), 111, 51-60; Schwecke et al., Proc. Natl. Acad. Sci.
U.S.A. (1995) 92 (17), 7839-7843). Comparison of the amino acid sequence depicted in SEQ
ID NO 2 with that of the very well characterized erythromycin polyketide synthase, eryA
ORF1 (Donadio, Science (1991) 252, 675-679; DNA sequence GenBank/EMBL accession No.
M63676), gives the following results:

Region from SEQ ID NO 2, amino acids 2-325: 40% identical to the acyltransferase
domain of module 2 of the eryA locus of Saccharopolyspora erythraea.

Region from SEQ ID NO 2, amino acids 325-470: 43% identical to the dehydratase
domain of module 4 of the eryA locus of Saccharopolyspora erythraea.

Region from SEQ ID NO 2, amino acids 762-940: 48% identical to the ketoreductase
domain of module 2 of the eryA locus of Saccharopolyspora erythraea.

Region from SEQ ID NO 2, amino acids 1024-1109: 57% identical to the acyl carrier
protein domain of module 2 of the eryA locus of Saccharopolyspora erythraea.

Region from SEQ ID NO 2, amino acids 1126-1584: 59% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
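
The domain assignments can be held in a small table and checked against the usual domain order of a modular polyketide synthase module (KS, AT, optional DH/ER/KR, ACP). Under that standard order an ACP directly followed by a KS would mark the junction between two modules. This is an illustrative sketch; the table restates the comparison above and `module_boundaries` is a hypothetical helper.

```python
# Domain assignments from the comparison with eryA ORF1 above:
# (first aa, last aa, domain, % identity)
DOMAINS = [
    (2, 325, "AT", 40),
    (325, 470, "DH", 43),
    (762, 940, "KR", 48),
    (1024, 1109, "ACP", 57),
    (1126, 1584, "KS", 59),
]


def module_boundaries(domains):
    """First residue of each KS that directly follows an ACP, i.e. where a new module starts."""
    ordered = sorted(domains)
    return [nxt[0] for cur, nxt in zip(ordered, ordered[1:])
            if cur[2] == "ACP" and nxt[2] == "KS"]


print(module_boundaries(DOMAINS))  # [1126]
```

Read this way, the 5.7 kb fragment would span the AT-to-ACP part of one module and the KS of the following one, which is consistent with the conclusion below that it encodes part of a larger polyketide synthase.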

The very large similarities found in the amino acid sequence and in the size and arrangement
of the enzymatic domains suggest that the cloned 5.7 kb KpnI region from
A. mediterranei codes for part of a polyketide synthase which is typical of polyketides of the
macrolide type.



Example 7: Construction of a cosmid gene bank from A. mediterranei 
The cosmid vector employed is the plasmid pWE15, which is commercially available (Stratagene,
La Jolla, CA, USA). pWE15 is completely cut with the enzyme BamHI (Maniatis et al. 1989)
and precipitated with ethanol. For ligation to the cosmid DNA, chromosomal DNA from
A. mediterranei is isolated as described in Example 1 and partially digested with the
restriction enzyme Sau3A (Boehringer Mannheim) to give DNA fragments most of which
have a size of 20-40 kb. The DNA pretreated in this way is fractionated by fragment size
by centrifugation (83,000 g, 20°C) on a 10% to 40% sucrose density gradient for 18 h. The
gradient is collected in 0.5 ml fractions, which are dialysed, and samples of 10 µl are analysed
on a 0.3% agarose gel with a DNA size standard. Fractions with chromosomal DNA 25-40 kb
in size are combined, precipitated with ethanol and resuspended in a small volume of
water.

Ligation of the cosmid DNA to the size-selected A. mediterranei Sau3A fragments
(see above) is carried out with T4 DNA ligase. About 3 µg of each of
the two DNA starting materials are employed in a reaction volume of 20 µl, and the ligation
is carried out at 12°C for 15 h. 4 µl of this ligation mixture are packaged into lambda
phages using the in vitro packaging kit available from Stratagene (La Jolla,
CA, USA) in accordance with the manufacturer's instructions. The resulting phages are
introduced by infection into the E. coli strain XL1-Blue MR (Stratagene). Titration of the
phage material reveals about 20,000 phage particles per ml. Analysis of 12 cosmid clones
shows that all the clones contain plasmid DNA inserts 25-40 kb in size.

Example 8: Identification, cloning and characterization of the chromosomal A. mediterranei DNA region which is adjacent to the cloned 5.7 kb KpnI fragment

To identify and clone the chromosomal A. mediterranei DNA region which is adjacent to the
5.7 kb KpnI fragment described above in Examples 3 and 5, a radioactive DNA probe is
first prepared from this 5.7 kb KpnI fragment. This is done by radiolabelling approximately
0.5 µg of the isolated DNA fragment with 32P-dCTP using the nick translation system of
Gibco/BRL (Basle) in accordance with the manufacturer's instructions.
Infection of E. coli XL1-Blue MR (Stratagene) with an aliquot of the lambda phages
packaged in vitro (see Example 7) results in more than 2000 clones on several LB +
ampicillin (50 µg/ml) plates. These clones are tested by colony hybridization on
nitrocellulose filters (see Example 3 for method). The DNA probe used is the 5.7 kb KpnI
DNA fragment from A. mediterranei radiolabelled with 32P-dCTP as described
above.
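
Whether about 2000 clones are enough to contain a given region can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size. The genome size used below (about 10 Mb) is an assumption that is not stated in the text, and the formula treats Sau3A partial-digest clones as a random sample, which is only approximately true.

```python
import math


def clones_needed(p: float, insert_kb: float, genome_kb: float) -> int:
    """Clarke-Carbon: clones needed so that a given sequence is present with probability p."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1 - p) / math.log(1 - f))


def prob_present(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Probability that a given sequence is contained in a library of n_clones random clones."""
    f = insert_kb / genome_kb
    return 1 - (1 - f) ** n_clones


GENOME_KB = 10_000  # assumed A. mediterranei genome size (~10 Mb); not given in the text
INSERT_KB = 30      # typical insert size of the cosmid library (25-40 kb)

print(clones_needed(0.99, INSERT_KB, GENOME_KB))  # 1533
print(round(prob_present(2000, INSERT_KB, GENOME_KB), 4))
```

Under these assumptions, screening more than 2000 clones gives a probability above 99% that the 5.7 kb KpnI region is represented, in line with the five positive cosmids found.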

Five cosmid clones showing a significant signal with the DNA probe are found. The plasmid
DNA of these cosmids is isolated (Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989), digested
with KpnI and analysed in an agarose gel. The analysis reveals that all 5 plasmids contain
integrated chromosomal A. mediterranei DNA with a size of the order of about 25-35 kb,
and all contain the 5.7 kb KpnI fragment.

To characterize the chromosomal A. mediterranei DNA region which is adjacent to the 
cloned KpnI fragment, the plasmid DNA of one of the 5 cosmid clones is subjected to
restriction analysis. The plasmid of the selected cosmid clone is designated pNE112 and
likewise comprises the 13 kb BglII fragment described in Example 3.

Digestion of the plasmid pNE112 with the restriction enzymes BamHI, BglII and HindIII
(singly and in combination) allows a restriction map of the cloned region of
A. mediterranei to be prepared, which characterizes this region, about 26 kb in size, of the
chromosome of A. mediterranei. The region has the following restriction cleavage sites,
given as the distance in kb from one end: BamHI at 3.2 kb, HindIII at 6.6 kb, BglII at
11.5 kb, BamHI at 16.6 kb, BamHI at 17.3 kb, BamHI at 21 kb and BglII at 24 kb.
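
The restriction map translates directly into the fragment sizes expected from single and double digests. This sketch uses only the site positions listed above; fragments running to the ends of the insert are left out because their size in the cosmid also depends on the flanking vector. `internal_fragments` is an illustrative name.

```python
# Restriction sites of the ~26 kb pNE112 insert, in kb from one end (from the map above).
SITES = {
    "BamHI": [3.2, 16.6, 17.3, 21.0],
    "HindIII": [6.6],
    "BglII": [11.5, 24.0],
}


def internal_fragments(*enzymes: str) -> list[float]:
    """Sizes (kb) of the fragments between consecutive sites of the given enzyme(s)."""
    positions = sorted(p for e in enzymes for p in SITES[e])
    return [round(b - a, 1) for a, b in zip(positions, positions[1:])]


print(internal_fragments("BamHI"))           # [13.4, 0.7, 3.7]
print(internal_fragments("BglII"))           # [12.5]
print(internal_fragments("BamHI", "BglII"))  # [8.3, 5.1, 0.7, 3.7, 3.0]
```

The 12.5 kb interval between the two BglII sites agrees, within the resolution of the map, with the 13 kb BglII fragment of pRi7-3 that pNE112 is stated to contain.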

Example 9: Determination of the sequence of the chromosomal A. mediterranei DNA region present in the plasmid pNE112 and overlapping with the cloned 5.7 kb KpnI fragment

The plasmid pNE112 DNA is broken directly into fragments using an Aero-Mist nebulizer
(CIS-US Inc., Bedford, MA, USA) under a nitrogen pressure of 8-12 pounds per square
inch. These random DNA fragments are treated with T4 DNA polymerase, T4 DNA kinase
and E. coli DNA polymerase in the presence of the 4 dNTPs in order to generate blunt ends
on the double-stranded DNA fragments (Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989). The
fragments are then fractionated in 0.8% low-melting agarose (FMC SeaPlaque Agarose,
Catalogue No. 50113), and fragments 1.5-2 kb in size are extracted by hot phenol extraction
(Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989). The DNA fragments obtained in this way are then
ligated with T4 DNA ligase into the plasmid vector pBRKanf4 (see Example 5) or
pBlueScript KS+ (Stratagene, La Jolla, CA, USA), each of which has been cut once to give
blunt ends by an appropriate restriction digestion (SmaI for pBRKanf4 and EcoRV for
pBlueScript KS+) and dephosphorylated at the ends by treatment with alkaline phosphatase
(Boehringer Mannheim). The ligation mixture is then transformed into E. coli DH5α, and the
cells are incubated overnight on LB agar with the appropriate antibiotic (kanamycin 40 µg/ml
for pBRKanf4, ampicillin 100 µg/ml for pBlueScript KS+). Grown colonies are transferred
singly into 1.25 ml of liquid TB medium with antibiotic in 96-well plates with wells of
2 ml volume, and incubated at 37°C overnight. Template DNA for the sequencing is
prepared directly from these cultures by alkaline lysis (Birnboim, Methods in Enzymology
(1983) 100, 243-255). DNA sequencing is carried out using the Perkin-Elmer/Applied
Biosystems reaction kit with dye-labelled terminator reagents (Kit No. 402122) and universal
M13mp18/19 primers or T3 and T7 primers, or with primers prepared by us which bind to
internal sequences. A standard cycle sequencing protocol with 20 cycles is used with a
thermocycler (MJ Research DNA Engine Thermocycler, Model 225). The sequencing
reactions are precipitated with ethanol, resuspended in formamide loading buffer, and
fractionated and analysed by electrophoresis on the Applied Biosystems automatic DNA
sequencer (Model 377) in accordance with the manufacturer's instructions. Sequence files
are produced with the Applied Biosystems DNA Analysis Software computer
program and transferred to a Sun UltraSPARC computer for further analysis. The following
computer programs (software) are employed for analysing the results: the DNA assembly and
analysis package GAP (Genetics Computer Group, University of Wisconsin; R. Staden,
Cambridge University, UK) and the four programs Phred, Cross_match, Phrap and Consed
(P. Green, University of Washington; B. Ewing and D. Gordon, Washington University in
Saint Louis). After the original sequences have been assembled into longer contiguous
sequences (contigs), missing DNA sections are specifically sequenced with the aid of new
primers (binding to sequenced sections), by longer sequencing runs or by sequencing the
other strand.
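
Why directed sequencing of missing sections is still needed after a random shotgun can be seen from the Lander-Waterman (Poisson) estimate: with mean coverage c = N·L/G, a fraction of about e^-c of the target is expected to be hit by no read. The 500 bp read length and the 8-fold target coverage below are assumptions for illustration, not values from the text.

```python
import math


def reads_for_coverage(coverage: float, target_bp: int, read_bp: int) -> int:
    """Number of random reads N giving mean coverage c = N * L / G."""
    return math.ceil(coverage * target_bp / read_bp)


def expected_uncovered_fraction(n_reads: int, target_bp: int, read_bp: int) -> float:
    """Poisson estimate of the fraction of bases covered by no read: e^-c."""
    c = n_reads * read_bp / target_bp
    return math.exp(-c)


TARGET_BP = 26_000  # the pNE112 insert (vector sequence ignored)
READ_BP = 500       # assumed usable read length; not stated in the text

n = reads_for_coverage(8, TARGET_BP, READ_BP)
print(n)  # 416
print(round(expected_uncovered_fraction(n, TARGET_BP, READ_BP) * TARGET_BP))  # 9
```

Even at this depth a few bases are expected to remain uncovered, and real gaps cluster at regions that clone or sequence poorly, hence the primer-directed finishing described above.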
Using the method described above, it is possible to sequence the entire chromosomal DNA
region, 26 kb in size, from A. mediterranei which is cloned in pNE112. The DNA sequence is
depicted in SEQ ID NO 3 in the section from base pair 27801 to 53789. The DNA sequence
of the 5.7 kb KpnI fragment described in Example 5 is present in pNE112 and is depicted in
SEQ ID NO 3 in the region from base pair 43093 to 48768.
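
The coordinates can be cross-checked against the sizes given earlier, assuming that SEQ ID NO 3 uses inclusive, 1-based numbering as is usual for sequence listings.

```python
def span_bp(first: int, last: int) -> int:
    """Length of a region given inclusive base-pair coordinates."""
    return last - first + 1


insert_bp = span_bp(27801, 53789)  # pNE112 region in SEQ ID NO 3
kpni_bp = span_bp(43093, 48768)    # 5.7 kb KpnI fragment in SEQ ID NO 3

print(insert_bp)                          # 25989, i.e. the "26 kb" region
print(kpni_bp)                            # 5676, the length given for SEQ ID NO 1
print(27801 <= 43093 and 48768 <= 53789)  # True: the KpnI fragment lies inside the region
```

The KpnI span reproduces exactly the 5676 bp stated for SEQ ID NO 1.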

Example 10: Identification and characterization of cosmid clones with chromosomal DNA
fragments from A. mediterranei which overlap with one end of the 26 kb
A. mediterranei region of pNE112
To identify cosmid clones which comprise chromosomal DNA fragments from
A. mediterranei located directly adjacent to the 26 kb region of pNE112, the plasmid pNE112
is cut with the restriction enzyme BamHI, and the resulting 3.2 kb BamHI fragment is
separated from the other BamHI fragments on an agarose gel and isolated from the gel. This
BamHI fragment is located at one end of the A. mediterranei DNA inserted in pNE112
(see Example 8) and can thus be used as a DNA probe for finding the required cosmid
clones. Approximately 0.5 µg of the isolated 3.2 kb BamHI DNA fragment is radiolabelled
with ³²P-dCTP using the nick translation system from Gibco/BRL (Basel) in accordance with
the manufacturer's instructions.

The cosmid gene bank from A. mediterranei described in Example 7 is then screened by
colony hybridization (method of Example 3) with this 3.2 kb DNA probe for clones with
overlapping inserts. Two cosmid clones with a strong hybridization signal are identified in this
way and designated pNE95 and pRi44-2. Restriction analysis and Southern blotting confirm
that the plasmids pNE95 and pRi44-2 comprise chromosomal DNA fragments from
A. mediterranei which overlap with the 3.2 kb BamHI fragment from pNE112 and together
cover a 35 kb chromosomal region of A. mediterranei which is directly adjacent to the
26 kb A. mediterranei fragment cloned in pNE112.

Example 11: Restriction analysis of the chromosomal A. mediterranei DNA region cloned
with the cosmid clones pNE112, pNE95 and pRi44-2
The chromosomal A. mediterranei DNA region cloned in the cosmid clones pNE112,
pNE95 and pRi44-2 is characterized by restriction analysis. Digestion of the
plasmid DNA of the three cosmids with the restriction enzymes EcoRI, BglII and HindIII
(singly and in combination) produces a rough restriction map of the cloned region of
A. mediterranei. Overlapping fragments of the three plasmids are identified in this way
and confirmed by Southern blotting. This chromosomal region of A. mediterranei is
about 61 kb in size and is characterized by the following restriction sites, given as the
distance in kb from one end: EcoRI at 7.2 kb, HindIII at 21 kb, BglII at 31 kb,
HindIII at 42 kb, BglII at 47 kb and BglII at 59 kb. Within this region of the
A. mediterranei chromosome, plasmid pRi44-2 covers the region from position 1 to
approximately 37 kb, plasmid pNE95 covers approximately 9 kb - 51 kb, and plasmid
pNE112 covers approximately 35 kb - 61 kb.
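The overlaps between the three cosmid inserts, and which cosmid carries each mapped restriction site, can be read off the approximate map positions above. A minimal sketch, assuming the stated positions are exact values in kb and treating "position 1" as 1 kb:

```python
from itertools import combinations

# Approximate insert extents (kb from one end of the ~61 kb region), taken from the text.
COSMIDS = {"pRi44-2": (1.0, 37.0), "pNE95": (9.0, 51.0), "pNE112": (35.0, 61.0)}
SITES = [("EcoRI", 7.2), ("HindIII", 21.0), ("BglII", 31.0),
         ("HindIII", 42.0), ("BglII", 47.0), ("BglII", 59.0)]

def overlap_kb(a, b):
    """Length of the shared part of two (start, end) intervals; 0 if disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def is_contiguous(intervals):
    """True if the intervals, taken together, leave no gap from first start to last end."""
    ordered = sorted(intervals)
    reach = ordered[0][1]
    for start, end in ordered[1:]:
        if start > reach:
            return False
        reach = max(reach, end)
    return True

pairwise = {(x, y): overlap_kb(COSMIDS[x], COSMIDS[y]) for x, y in combinations(COSMIDS, 2)}
site_hits = {f"{enzyme}@{pos}": [name for name, (s, e) in COSMIDS.items() if s <= pos <= e]
             for enzyme, pos in SITES}

for pair, kb in pairwise.items():
    print(pair, f"{kb:.0f} kb overlap")
print("contiguous:", is_contiguous(COSMIDS.values()))
for site, names in site_hits.items():
    print(site, names)
```

With these values pRi44-2/pNE95 share about 28 kb, pNE95/pNE112 about 16 kb and pRi44-2/pNE112 about 2 kb, and the three inserts tile the region without a gap.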

Example 12: Determination of the sequence of the chromosomal A. mediterranei DNA
region described in Example 11 from the EcoRI site at the 7.2 kb position
up to the 61 kb end
The DNA sequence of the chromosomal region of A. mediterranei described in Example 11
(from the EcoRI site at the 7.2 kb position to 51 kb) is determined with
the plasmids pRi44-2 and pNE95, using exactly the same method as described in
Example 9. Analysis of the DNA sequence obtained in this way confirms the rough restriction
map described in Example 11 and the overlaps of the cloned A. mediterranei fragments in the
plasmids pNE112, pNE95 and pRi44-2.

The DNA sequence of the chromosomal A. mediterranei DNA region described in Example
11, from the EcoRI site at the 7.2 kb position up to the end at 61 kb, is depicted in
SEQ ID NO 3 (length 53789 base pairs).
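Positions in SEQ ID NO 3 can be related to the kb map of Example 11. A minimal sketch, assuming that base pair 1 of SEQ ID NO 3 lies at the EcoRI site (7.2 kb) and that the sequence runs in the same direction as the map:

```python
ECORI_MAP_KB = 7.2     # map position of the EcoRI site where SEQ ID NO 3 begins
SEQ3_LENGTH = 53789    # length of SEQ ID NO 3 in bp

def seq3_to_map_kb(pos: int) -> float:
    """Convert a 1-based SEQ ID NO 3 position to an approximate map position in kb."""
    if not 1 <= pos <= SEQ3_LENGTH:
        raise ValueError(f"position {pos} is outside SEQ ID NO 3")
    return ECORI_MAP_KB + (pos - 1) / 1000

for label, pos in [("start of SEQ ID NO 3", 1),
                   ("start of pNE112 region", 27801),
                   ("end of SEQ ID NO 3", SEQ3_LENGTH)]:
    print(f"{label}: {seq3_to_map_kb(pos):.1f} kb")
```

Under these assumptions the pNE112 region begins at about 35.0 kb and the sequence ends at about 61.0 kb, which agrees with the approximate insert positions given in Example 11.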

Example 13: Analysis of a first protein-encoding region (ORF A) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3
The nucleotide sequence shown in SEQ ID NO 3 is analysed with the Codonpreference
computer program (Genetics Computer Group, University of Wisconsin, 1994). This analysis
shows that a very large open reading frame (ORF A) which codes for a protein is present in
the first third of the sequence (position 1825 - 15543 including stop codon in SEQ ID NO 3).
The codons used in ORF A are typical of actinomycetes genes with a high G+C content.
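The ORF coordinates and the protein length given below (SEQ ID NO 4, 4572 amino acids) can be cross-checked. A minimal sketch, assuming 1-based inclusive coordinates on the forward strand that include the stop codon:

```python
def protein_length(start: int, end_incl_stop: int) -> int:
    """Amino acids encoded by an ORF whose inclusive coordinates include the stop codon."""
    nt = end_incl_stop - start + 1
    if nt % 3 != 0:
        raise ValueError(f"ORF length {nt} nt is not a whole number of codons")
    return nt // 3 - 1   # subtract the stop codon, which encodes no amino acid

print(protein_length(1825, 15543))   # ORF A: 13719 nt = 4573 codons
```

The same arithmetic can be applied to the other ORFs of SEQ ID NO 3 described in the following examples.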

Comparison of the amino acid sequence of ORF A (SEQ ID NO 4, size 4572 amino acids)
with other polyketide synthases, and specifically with the very well characterized polyketide
synthase of Saccharopolyspora erythraea (Donadio, Science (1991) 252, 675-679, DNA
sequence GenBank/EMBL accession No. M63676), gives the following results:

Region of ORF A, SEQ ID NO 4: amino acids 370 - 451: is 50% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 469 - 889: is 65% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 982 - 1292: is 54% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 1324 - 1442: is 42% identical to the
dehydratase domain of module 4 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 1664 - 1840: is 56% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 1929 - 2000: is 53% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 2032 - 2453: is 64% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 2554 - 2865: is 37% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 2918 - 2991: is 54% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 3009 - 3431: is 65% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 3532 - 3847: is 53% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 4142 - 4307: is 43% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF A, SEQ ID NO 4: amino acids 4405 - 4490: is 50% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.

In addition to these significant homologies with the eryA polyketide synthase of
S. erythraea, the region of ORF A, SEQ ID NO 4: amino acids 1 - 356 is 53% identical to the
postulated starter unit activation domain of the rapamycin polyketide synthase from
Streptomyces hygroscopicus (Aparicio et al., Gene (1996) 169, 9-16).

The great similarities found in the amino acid sequences of the enzymatic domains strongly
indicate that the protein-encoding region (ORF A) of the A. mediterranei
chromosomal region depicted in SEQ ID NO 3 codes for a typical modular (type I)
polyketide synthase. This very large A. mediterranei polyketide synthase encoded by
ORF A comprises three complete functional modules, each of which is responsible for the
condensation of one C2 unit into the macrocyclic ring of the molecule and for the correct
modification of the initially formed β-keto group. Because of the homology with the
activating domain of the rapamycin polyketide synthase, the N-terminal region of ORF A
described above (amino acids 1 - 356) very probably comprises an enzymatic domain for
activating the aromatic starter unit of rifamycin biosynthesis, 3-amino-5-hydroxybenzoic acid
(Ghisalba et al., Biotechnology of Industrial Antibiotics, Vandamme E. J. Ed., Dekker Inc.,
New York (1984) 281-327).
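The module count can be read off the domain order listed above. A minimal sketch, assuming (as is usual for modular type I polyketide synthases) that each extension module begins with a ketoacyl synthase (KS) domain and ends with an acyl carrier protein (ACP) domain; the label `LOAD` for the starter-unit activation region (amino acids 1 - 356) is a hypothetical name introduced only for this illustration:

```python
# ORF A domains (name, first aa, last aa), transcribed from the comparison above.
# AT = acyltransferase, DH = dehydratase, KR = ketoreductase; "LOAD" is a hypothetical label.
ORF_A = [("LOAD", 1, 356), ("ACP", 370, 451),
         ("KS", 469, 889), ("AT", 982, 1292), ("DH", 1324, 1442),
         ("KR", 1664, 1840), ("ACP", 1929, 2000),
         ("KS", 2032, 2453), ("AT", 2554, 2865), ("ACP", 2918, 2991),
         ("KS", 3009, 3431), ("AT", 3532, 3847), ("KR", 4142, 4307),
         ("ACP", 4405, 4490)]

def split_modules(domains):
    """Split N- to C-terminal domains into a loading region and KS-initiated modules."""
    loading, modules = [], []
    for name, _start, _end in sorted(domains, key=lambda d: d[1]):
        if name == "KS":
            modules.append([name])        # a KS opens a new extension module
        elif modules:
            modules[-1].append(name)      # later domains belong to the current module
        else:
            loading.append(name)          # domains before the first KS: loading region
    return loading, modules

def is_complete(module):
    """Minimal extension module: contains KS, AT and ACP and ends with the ACP."""
    return {"KS", "AT", "ACP"} <= set(module) and module[-1] == "ACP"

loading, modules = split_modules(ORF_A)
print("loading:", loading)
for i, m in enumerate(modules, 1):
    print(f"module {i}: {'-'.join(m)} complete={is_complete(m)}")
```

This yields a loading region (LOAD-ACP) followed by three complete extension modules, KS-AT-DH-KR-ACP, KS-AT-ACP and KS-AT-KR-ACP.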

Example 14: Analysis of a second protein-encoding region (ORF B) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3

The nucleotide sequence in SEQ ID NO 3 is analysed using the Codonpreference computer 
program (Genetics Computer Group, University of Wisconsin, 1994). This analysis shows 
that another large open reading frame (ORF B) which codes for a protein is present in the 
middle region of the sequence (position 15550 - 30759 including stop codon in SEQ ID 
NO 3). The codons used in ORF B are typical of actinomycetes genes with a high G+C 
content. 
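One simple indicator used in codon-preference analyses of high-G+C actinomycete genes is the strong G/C bias at the third codon position. The sketch below computes overall G+C content and third-position G+C (GC3) for an in-frame coding sequence; it is not the Codonpreference algorithm itself, and the short demonstration sequence is made up.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(cds: str) -> float:
    """Fraction of G/C at the third position of each codon of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_positions = cds.upper()[2::3]
    return gc_fraction(third_positions)

demo = "ATGGCCGACCTGTGA"   # hypothetical 5-codon example: ATG GCC GAC CTG TGA
print(f"G+C = {gc_fraction(demo):.2f}, GC3 = {gc3_fraction(demo):.2f}")  # G+C = 0.60, GC3 = 0.80
```

Applied to a real open reading frame such as ORF B, a GC3 value well above the overall G+C content would be one sign of the codon bias described above.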

Comparison of the amino acid sequence of ORF B (SEQ ID NO 5, length 5069 amino acids)
with other polyketide synthases, and specifically with the very well characterized polyketide
synthase of Saccharopolyspora erythraea (Donadio, Science (1991) 252, 675-679, DNA
sequence GenBank/EMBL accession No. M63676), gives the following results:

Region of ORF B, SEQ ID NO 5: amino acids 44 - 468: is 62% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 571 - 889: is 56% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 921 - 1055: is 47% identical to the
dehydratase domain of module 4 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 1353 - 1525: is 49% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 1621 - 1706: is 53% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 1726 - 2148: is 62% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 2251 - 2560: is 55% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 2961 - 3132: is 49% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 3228 - 3313: is 52% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 3332 - 3755: is 63% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 3857 - 4173: is 52% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 4664 - 4799: is 47% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF B, SEQ ID NO 5: amino acids 4929 - 5014: is 52% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.

Example 15: Analysis of a third protein-encoding region (ORF C) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3

The nucleotide sequence in SEQ ID NO 3 is analysed using the Codonpreference computer
program (Genetics Computer Group, University of Wisconsin, 1994). This analysis shows
that a large open reading frame (ORF C) which codes for a protein is present in the middle
region of the sequence (position 30895 - 36060 including stop codon in SEQ ID NO 3). The
codons used in ORF C are typical of actinomycetes genes with a high G+C content.

Comparison of the amino acid sequence of ORF C (SEQ ID NO 6, length 1721 amino acids)
with other polyketide synthases, and specifically with the very well characterized polyketide
synthase from Saccharopolyspora erythraea (Donadio, Science (1991) 252, 675-679, DNA
sequence GenBank/EMBL accession No. M63676), gives the following results:

Region of ORF C, SEQ ID NO 6: amino acids 1 - 414: is 63% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF C, SEQ ID NO 6: amino acids 514 - 828: is 54% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF C, SEQ ID NO 6: amino acids 1290 - 1399: is 49% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF C, SEQ ID NO 6: amino acids 1563 - 1648: is 55% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.

Example 16: Analysis of a fourth protein-encoding region (ORF D) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3

The nucleotide sequence in SEQ ID NO 3 is analysed using the Codonpreference computer 
program (Genetics Computer Group, University of Wisconsin, 1994). This analysis shows 
that a large open reading frame (ORF D) which codes for a protein is present in the middle 
region of the sequence (position 36259 - 41325 including stop codon in SEQ ID NO 3). The 
codons used in ORF D are typical of actinomycetes genes with a high G+C content. 

Comparison of the amino acid sequence of ORF D (SEQ ID NO 7, length 1688 amino
acids) with other polyketide synthases, and specifically with the very well characterized
polyketide synthase from Saccharopolyspora erythraea (Donadio, Science (1991) 252,
675-679, DNA sequence GenBank/EMBL accession No. M63676), gives the following results:

Region of ORF D, SEQ ID NO 7: amino acids 1 - 418: is 64% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF D, SEQ ID NO 7: amino acids 524 - 841: is 54% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF D, SEQ ID NO 7: amino acids 1260 - 1432: is 51% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF D, SEQ ID NO 7: amino acids 1523 - 1608: is 53% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.

Example 17: Analysis of a fifth protein-encoding region (ORF E) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3

The nucleotide sequence in SEQ ID NO 3 is analysed using the Codonpreference 
computer program (Genetics Computer Group, University of Wisconsin, 1994). This analysis
shows that a large open reading frame (ORF E) which codes for a protein is present in the 
rear region of the sequence (position 41373 - 51614 including stop codon in SEQ ID NO 3). 
The codons used in ORF E are typical of actinomycetes genes with a high G+C content. 

Comparison of the amino acid sequence of ORF E (SEQ ID NO 8, length 3413 amino
acids) with other polyketide synthases, and specifically with the very well characterized
polyketide synthase from Saccharopolyspora erythraea (Donadio, Science (1991) 252,
675-679, DNA sequence GenBank/EMBL accession No. M63676), gives the following results:

Region of ORF E, SEQ ID NO 8: amino acids 31 - 451: is 64% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 555 - 874: is 37% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 907 - 1036: is 49% identical to the
dehydratase domain of module 4 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 1336 - 1500: is 52% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 1598 - 1683: is 51% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 1702 - 2124: is 62% identical to the ketoacyl
synthase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 2229 - 2543: is 53% identical to the
acyltransferase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 2573 - 2700: is 47% identical to the
dehydratase domain of module 4 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 3054 - 3227: is 52% identical to the
ketoreductase domain of module 1 of the eryA locus of Saccharopolyspora erythraea.
Region of ORF E, SEQ ID NO 8: amino acids 3324 - 3405: is 51% identical to the acyl carrier
protein domain of module 1 of the eryA locus of Saccharopolyspora erythraea.

Example 18: Analysis of a sixth protein-encoding region (ORF F) of the cloned
A. mediterranei chromosomal region depicted in SEQ ID NO 3

The nucleotide sequence in SEQ ID NO 3 is analysed using the Codonpreference 
computer program (Genetics Computer Group, University of Wisconsin, 1994). This analysis 
shows that an open reading frame (ORF F) which codes for a protein is present in the rear 
region of the sequence (position 51713 - 52393 including stop codon in SEQ ID NO 3). The 
codons used in ORF F are typical of actinomycetes genes with a high G+C content.

Comparison of the amino acid sequence of ORF F (SEQ ID NO 9, length 226 amino acids)
with proteins from the EMBL databank (Heidelberg) shows a great similarity to the
N-hydroxyarylamine O-acyltransferase from Salmonella typhimurium (29% identity over a
region of 134 amino acids). There is also significant homology with arylamine
acyltransferases from other organisms. From these similarities it can be concluded that
ORF F of A. mediterranei in SEQ ID NO 3 codes for an arylamine acyltransferase,
and it can be assumed that this enzyme is responsible for linking the long acyl chain
produced by the polyketide synthase to the amino group of the starter molecule,
3-amino-5-hydroxybenzoic acid. This reaction, which forms the amide linkage, would close
the rifamycin ring system after completion of the condensation steps by the polyketide synthase.
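Percent identity figures such as "29% identity over a region of 134 amino acids" depend on how aligned positions are counted. A minimal sketch, assuming identity is taken over aligned columns in which neither sequence has a gap (other tools divide by the full alignment length instead); the toy alignment is made up:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)

print(round(percent_identity("MKV-LAT", "MRVALST"), 1))   # toy alignment: 4 of 6 columns

# Identical-residue counts compatible with "29% over 134 amino acids",
# assuming the 29% is a rounded value computed over exactly 134 positions.
print([n for n in range(135) if round(100 * n / 134) == 29])
```

Under that assumption, the reported value corresponds to 39 identical residues out of 134.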

Example 19: Summarizing assessment of the function of the proteins encoded by ORF A - F
in SEQ ID NO 3, and their role in the biosynthesis of rifamycin

The five protein-encoding regions (ORF A - E) of SEQ ID NO 3 described in Examples 13 - 17
encode proteins with very great similarity (in the amino acid sequence and in the
arrangement of the enzymatic domains) to polyketide synthases for polyketides of the
macrolide type. Taken together, these five multifunctional enzymes comprise 10 polyketide
synthase modules, each of which is responsible for one condensation step in the polyketide
synthesis. Ten such condensation steps are likewise necessary for rifamycin biosynthesis
(Ghisalba et al., Biotechnology of Industrial Antibiotics, Vandamme E. J. Ed., Dekker Inc.,
New York (1984) 281-327). The processing of the individual keto groups by the
enzymatic domains within the modules substantially corresponds to that required for
the rifamycin molecule, if it is assumed that the polyketide synthesis takes place "colinearly"
with the arrangement of the modules in the gene cluster of A. mediterranei (as is the case for
other macrolide antibiotics such as erythromycin and rapamycin). It should be added that
it is not certain whether the five ORFs are translated into five separate proteins; in particular,
ORF C and ORF D might be translated into a single large protein.
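The module total and the expected keto-group processing can be tallied from Examples 13 - 17. A minimal sketch, assuming colinear synthesis in the order ORF A to ORF E and assuming that every DH and KR domain identified by homology is active (homology alone does not establish activity); no enoylreductase (ER) domain is listed in those examples:

```python
# Reductive domains present in each extension module, per ORF, transcribed from
# Examples 13 - 17 (KS, AT and ACP domains are omitted because every module has them).
MODULES = {
    "A": [{"DH", "KR"}, set(), {"KR"}],
    "B": [{"DH", "KR"}, {"KR"}, {"KR"}],
    "C": [{"KR"}],
    "D": [{"KR"}],
    "E": [{"DH", "KR"}, {"DH", "KR"}],
}

def beta_carbon_state(reductive: set) -> str:
    """Expected beta-carbon after one extension, if every listed domain is active."""
    if {"KR", "DH", "ER"} <= reductive:
        return "CH2 (fully reduced)"
    if {"KR", "DH"} <= reductive:
        return "C=C (enoyl)"
    if "KR" in reductive:
        return "CH-OH (hydroxyl)"
    return "C=O (keto)"

total = sum(len(mods) for mods in MODULES.values())
print("extension modules:", total)   # 10, one per condensation step
step = 0
for orf, mods in MODULES.items():
    for reductive in mods:
        step += 1
        print(f"step {step:2d} (ORF {orf}): {beta_carbon_state(reductive)}")
```

The per-step predictions are only as reliable as the domain assignments: an inactive KR or DH domain would change the expected oxidation state at that position.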

An enzymatic domain which is very probably responsible for activating the starter molecule
of rifamycin biosynthesis, 3-amino-5-hydroxybenzoic acid, is found at the N terminus of
ORF A, the start of the polyketide synthase. Directly downstream of the described rifamycin
polyketide synthase gene cluster there is a gene (ORF F) which very probably encodes a
protein that brings about ring closure of the rifamycin molecule after completion of the
condensation steps by the polyketide synthase.

It can be concluded from these findings that the A. mediterranei chromosomal
region described in SEQ ID NO 3 is responsible for the ten condensation steps required for
rifamycin polyketide synthesis, including activation of the starter molecule
3-amino-5-hydroxybenzoic acid, and for the concluding ring closure.



Deposited microorganisms 

The following microorganisms and plasmids have been deposited at the Deutsche
Sammlung von Mikroorganismen und Zellkulturen GmbH (DSM), Mascheroder Weg 1b,
D-38124 Braunschweig, in accordance with the requirements of the Budapest Treaty.

Microorganism/Plasmid            Date of deposit    Deposit number

E. coli with plasmid pRi7-3      10.08.96           DSM 11114
E. coli with plasmid pNE112      14.07.97           DSM 11657
E. coli with plasmid pNE95       14.07.97           DSM 11656
E. coli with plasmid pRi44-2     14.07.97           DSM 11655



